Abstract

When the entropy method is used to derive tail bounds, the left tail bounds are
typically weaker than the right ones. This asymmetry has been observed many
times. Surprisingly, we find an entropy method for the left tail that works in
much the same way as it does for the right tail. This new method does not work
in all cases. We provide a meaningful example.

1 Introduction. 

In recent years, there have been interesting developments in the analysis of the spectrum
of large random matrices. In particular, the asymptotic distribution of the largest
eigenvalue has been a subject of intense interest.

Let $X = (X_{ij})$ be an $n \times n$ complex Hermitian matrix such that the entries $X_{ij}$ on
and above the diagonal are independent complex (real on the diagonal) centered normal
random variables with variance 1. Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ be the $n$ real eigenvalues of
$\frac{1}{\sqrt{n}} X$. There has been much research on the concentration of the largest eigenvalue
$\lambda_1$ and on the concentration of the $k$-th largest eigenvalue $\lambda_k$. Regarding the concentration
of the $k$-th largest eigenvalue, we know of three results: Alon, Krivelevich, Vu (2002),
Meckes (2004), and Maurer (2006). Alon, Krivelevich, Vu (2002) and Meckes (2004) used
Talagrand's method, whereas Maurer (2006) used the entropy method. Since the main
theme of this paper is the entropy method, we state Maurer's concentration result.

Theorem A. [Maurer (2006)] Let $X = (X_{ij})$ be an $n \times n$ real symmetric matrix such
that the entries $X_{ij}$ on and above the diagonal are independent with $|X_{ij}| \le 1$. Let
$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ be the $n$ real eigenvalues of $X$. Then, for all $k, n \ge 1$, and for all
$t > 0$,

P(A. - EX, > t) < exp (- Jl) . P(A. - EX, < -t) < exp (^j^r^) • d-D 
The left tail bound in (1.1) is larger than the right one. This asymmetry usually
appears when we use the entropy method to get tail bounds. However, this
asymmetry is not observed in the works of Alon, Krivelevich, Vu (2002) and Meckes
(2004), which are based on Talagrand's method. In these works the left tail bounds are
the same as the right ones. In addition, the centering is at the median, not the mean. This
symmetry and this centering are typical of Talagrand's method.

In this paper we find an entropy method for the left tail that works in much the same
way as it does for the right tail, by controlling the term $\Delta^2$ carefully (see (2.9)
in Section 2 for the definition of $\Delta^2$), and we give a meaningful example.

The rest of the paper is organized as follows. In Section 2, we develop an entropy
method for the left tail. In Section 3, we apply this new method to an interesting case
involving the $k$-th largest eigenvalue of a sample covariance matrix.

2 Entropy method for the left tail. 

The concentration of measure phenomenon for product measures has been investigated
in depth by Talagrand (1995, 1996) in a most remarkable way. His method has
been applied to various interesting cases. In many cases it yielded record-setting
concentration inequalities, and in some cases it even produced non-trivial
concentration inequalities for the first time. However, his method is technically very
complicated. Hence many people have tried to simplify his proofs and to find an
alternative that reproduces and, more ambitiously, extends his results. One of the
successful alternatives is the entropy method. Here we explain only the minimum details of
the entropy method needed to show our contribution to this subject. See Ledoux
(1996), Massart (2000), Boucheron, Lugosi, Massart (2000, 2003), and Maurer (2006) for
the full details.

Let $X_1, \ldots, X_n$ be independent and let $G = G(X_1, \ldots, X_n) > 0$. Define the entropy
$H(G)$ and the partial entropy $H_k(G)$ by

$$H(G) := E\,G \log G - EG \log EG,$$
$$H_k(G) := E_k G \log G - E_k G \log E_k G,$$

where $E$ is the integration over $X_1, \ldots, X_n$ whereas $E_k$ is the integration over $X_k$ only.
So, the entropy $H(G)$ is a real number but the partial entropy $H_k(G)$ is a random
variable which does not depend on $X_k$.



Entropy method for the left tail 3 

Some classical formulas for the entropy are quite helpful:

$$H(G) = \sup_T E\,G(\log T - \log ET), \tag{2.1}$$
$$H(G) = \inf_c E\big[G(\log G - \log c) - (G - c)\big], \tag{2.2}$$

where the supremum in (2.1) is taken over the strictly positive random variables $T$ and
where the infimum in (2.2) runs over the strictly positive constants $c$. (2.1) is called
the duality formula of the entropy and (2.2) is called the variation formula.
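As a quick numerical sanity check of the variation formula, the following minimal Python sketch evaluates both sides for a small discrete random variable. The values and probabilities of $G$ are hypothetical and chosen only for illustration; the sketch uses the fact that the infimum over $c$ is attained at $c = EG$.

```python
import math

def entropy(values, probs):
    """H(G) = E[G log G] - E[G] log E[G] for a strictly positive discrete G."""
    mean = sum(p * g for g, p in zip(values, probs))
    return sum(p * g * math.log(g) for g, p in zip(values, probs)) - mean * math.log(mean)

def variation_objective(values, probs, c):
    """E[ G (log G - log c) - (G - c) ] for a strictly positive constant c."""
    return sum(p * (g * (math.log(g) - math.log(c)) - (g - c))
               for g, p in zip(values, probs))

values = [0.5, 1.0, 3.0]   # hypothetical values taken by G
probs = [0.2, 0.5, 0.3]    # hypothetical probabilities (sum to 1)

mean = sum(p * g for g, p in zip(values, probs))
h = entropy(values, probs)
at_mean = variation_objective(values, probs, mean)   # objective at c = E[G]
grid_min = min(variation_objective(values, probs, 0.05 * j) for j in range(1, 200))

print(h, at_mean, grid_min)
```

The objective at $c = EG$ should agree with $H(G)$, and no value of $c$ on the grid should fall below it.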

Here is the well-known entropy inequality (or tensorization inequality), which
follows from the duality formula (2.1).

Lemma 1. [Entropy inequality]

$$H(G) \le \sum_{k=1}^{n} E H_k(G). \tag{2.3}$$
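The entropy inequality can be checked numerically in the smallest non-trivial case of two independent discrete variables. The sketch below is only an illustration: the table of values of $G$ and the two marginal distributions are hypothetical.

```python
import math

def entropy(values, probs):
    """H(G) = E[G log G] - E[G] log E[G] for a strictly positive discrete G."""
    mean = sum(p * g for g, p in zip(values, probs))
    return sum(p * g * math.log(g) for g, p in zip(values, probs)) - mean * math.log(mean)

# Hypothetical table G[i][j] = G(X_1 = i, X_2 = j) with independent X_1, X_2.
G = [[1.0, 2.0, 0.5],
     [4.0, 0.3, 1.5]]
p = [0.4, 0.6]        # hypothetical law of X_1
q = [0.2, 0.5, 0.3]   # hypothetical law of X_2

# Full entropy H(G) under the product measure.
full = entropy([G[i][j] for i in range(2) for j in range(3)],
               [p[i] * q[j] for i in range(2) for j in range(3)])

# E H_1(G): integrate over X_1 with X_2 fixed, then average over X_2.
e_h1 = sum(q[j] * entropy([G[i][j] for i in range(2)], p) for j in range(3))
# E H_2(G): integrate over X_2 with X_1 fixed, then average over X_1.
e_h2 = sum(p[i] * entropy(G[i], q) for i in range(2))

print(full, e_h1 + e_h2)
```

The first printed number is $H(G)$ and the second is $E H_1(G) + E H_2(G)$, so the first should not exceed the second.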

Now, let $Z = Z(X_1, \ldots, X_n)$ be the random variable of interest. We apply the
entropy inequality to the random variable $e^{\lambda Z}$. Then, we have

$$E\,\lambda Z e^{\lambda Z} - E e^{\lambda Z} \log E e^{\lambda Z} \le \sum_{k=1}^{n} E H_k(e^{\lambda Z}). \tag{2.4}$$

To estimate the term $E H_k(e^{\lambda Z})$, we apply the variation formula (2.2) to the partial
entropy $H_k(e^{\lambda Z})$:

$$H_k(e^{\lambda Z}) = \inf_c E_k\big[e^{\lambda Z}(\lambda Z - \log c) - (e^{\lambda Z} - c)\big].$$

Since the integration is only over $X_k$, during the evaluation of the partial entropy $H_k(e^{\lambda Z})$ we can treat
all the other random variables $X_j$, $1 \le j \ne k \le n$, as fixed constants. So, in fact
$c$ can be chosen as a function of $X_1, \ldots, X_{k-1}, X_{k+1}, \ldots, X_n$, or even as a function
of $X_1, \ldots, X_{k-1}, X_k', X_{k+1}, \ldots, X_n$, where $X_k'$ is an independent copy of $X_k$ and $X_k'$ is
independent of $X_1, \ldots, X_n$. This subtle point on $c$ is crucial for the further development
of the theory. If we choose a particular "constant" $c_0$ to estimate the partial entropy
$H_k(e^{\lambda Z})$, then we have

$$H_k(e^{\lambda Z}) \le E_k\big[e^{\lambda Z}(\lambda Z - \log c_0) - (e^{\lambda Z} - c_0)\big]. \tag{2.5}$$

To get a good concentration inequality, we have to choose a $c_0$ that is well suited to the
random variable $Z$ of interest.

There are many possible choices of $c_0$. Massart (2000) and Boucheron, Lugosi,
Massart (2003) chose

$$c_0 := \exp\big(\lambda Z(X_1, \ldots, X_k', \ldots, X_n)\big) := e^{\lambda Z_k},$$

where $X_k'$ is an independent copy of $X_k$.
Boucheron, Lugosi, Massart (2000) chose

$$c_0 := \exp\big(\lambda Z(X_1, \ldots, \widehat{X}_k, \ldots, X_n)\big).$$

Here, $\widehat{X}_k$ means that we drop $X_k$ from the argument of $Z$. In other words, we
evaluate $Z$ based not on $\{X_1, \ldots, X_n\}$ but on $\{X_1, \ldots, X_n\} \setminus \{X_k\}$. This is
possible because of the special nature of the random variable $Z$ they considered.
Maurer (2006) chose

$$c_0 := \exp\Big(\lambda \inf_{x_k} Z(X_1, \ldots, x_k, \ldots, X_n)\Big) := e^{\lambda Z_k}, \tag{2.6}$$

where the infimum runs over all the possible values $x_k$ that $X_k$ can take,
or over a compact set containing the support of the distribution of $X_k$. He used
this $c_0$ (or $Z_k$) to get the right tail bound in Theorem A. He also used the same $Z_k$ to
obtain the left tail bound in the same theorem.

In this paper we follow in the footsteps of Maurer for the right tail bound. However,
to get a better left tail bound we choose the following $c_0 = e^{\lambda Z_k}$ for the left tail bound:

$$c_0 := \exp\Big(\lambda \sup_{x_k} Z(X_1, \ldots, x_k, \ldots, X_n)\Big) := e^{\lambda Z_k}, \tag{2.7}$$

where the supremum runs over a compact set containing the support of the distribution
of $X_k$. This choice does not always come with a sensible $\Delta^2$ (see (2.9) below for
the definition of $\Delta^2$). However, in many cases with this choice we do have $\Delta^2$ with
$\|\Delta^2\|_\infty < \infty$.

Let us recall what we have done so far with the entropy inequality. We first apply
the entropy inequality to $G = e^{\lambda Z}$, where $Z$ is the random variable of interest. Then,
the term $E H_k(e^{\lambda Z})$ appears in the inequality. To estimate the term $E H_k(e^{\lambda Z})$, with a
particular choice $c_0 = e^{\lambda Z_k}$ we apply the variation formula to $H_k(e^{\lambda Z})$. Then, we get
the following log-Sobolev inequality.

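Lemma 2. [Log-Sobolev inequality] If $-\lambda(Z - Z_k) \le 0$ for all $k$, then

$$E\,\lambda Z e^{\lambda Z} - E e^{\lambda Z} \log E e^{\lambda Z} \le \frac{\lambda^2}{2} E e^{\lambda Z} \Delta^2, \tag{2.8}$$

where

$$\Delta^2 := \sum_{k=1}^{n} (Z - Z_k)^2. \tag{2.9}$$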
Proof. With a particular choice $c_0 = e^{\lambda Z_k}$, from (2.5) we have

$$H_k(e^{\lambda Z}) \le E_k e^{\lambda Z}\Big(e^{-\lambda(Z - Z_k)} - \big(1 - \lambda(Z - Z_k)\big)\Big).$$

Suppose $-\lambda(Z - Z_k) \le 0$ for all $k$. Since $(e^x - (1 + x))/x^2$ is an increasing function with the
function value $1/2$ at the trouble spot $x = 0$, and hence $(e^x - (1 + x))/x^2 \le 1/2$
for $x \le 0$, taking $x = -\lambda(Z - Z_k)$ we have

$$H_k(e^{\lambda Z}) \le \frac{\lambda^2}{2} E_k e^{\lambda Z} (Z - Z_k)^2.$$

Plug this estimate into (2.4) and we get the log-Sobolev inequality (2.8). □
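The elementary fact used in the proof, namely that $(e^x - (1 + x))/x^2$ is increasing and bounded by $1/2$ for $x \le 0$, can be checked on a grid. The following sketch is only a numerical illustration; the grid range and step are arbitrary choices.

```python
import math

def phi(x):
    """(e^x - (1 + x)) / x^2 for x != 0; expm1 limits cancellation near 0."""
    return (math.expm1(x) - x) / (x * x)

# Increasing grid of negative points from -10 up to about -0.01.
xs = [-10.0 + 0.01 * j for j in range(1000)]
values = [phi(x) for x in xs]

bounded = all(v <= 0.5 for v in values)                       # phi(x) <= 1/2 for x < 0
increasing = all(a < b for a, b in zip(values, values[1:]))   # phi is increasing on the grid

print(bounded, increasing, values[0], values[-1])
```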

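To distinguish our choice (2.7) from Maurer's choice (2.6), from now on we let

$$\Delta_+^2 := \sum_{k=1}^{n} \Big(Z - \inf_{x_k} Z(X_1, \ldots, x_k, \ldots, X_n)\Big)^2 := \sum_{k=1}^{n} \big(Z - Z_k^{(+)}\big)^2,$$

$$\Delta_-^2 := \sum_{k=1}^{n} \Big(Z - \sup_{x_k} Z(X_1, \ldots, x_k, \ldots, X_n)\Big)^2 := \sum_{k=1}^{n} \big(Z - Z_k^{(-)}\big)^2.$$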

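Here is our entropy method for the left tail, which is a simple consequence of the
log-Sobolev inequality.

Theorem 1. (i) If $\|\Delta_+^2\|_\infty < \infty$, then for $t > 0$,

$$P(Z - EZ \ge t) \le \exp\Big(-\frac{t^2}{2\|\Delta_+^2\|_\infty}\Big). \tag{2.10}$$

(ii) If $\|\Delta_-^2\|_\infty < \infty$, then for $t > 0$,

$$P(Z - EZ \le -t) \le \exp\Big(-\frac{t^2}{2\|\Delta_-^2\|_\infty}\Big). \tag{2.11}$$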

Remark. As Maurer pointed out in private communication, $\|\Delta_+^2\|_\infty \ne \|\Delta_-^2\|_\infty$ in general.
However, in practice we do not know the exact values of $\|\Delta_+^2\|_\infty$ and $\|\Delta_-^2\|_\infty$. Instead
we calculate upper bounds on $\|\Delta_+^2\|_\infty$ and $\|\Delta_-^2\|_\infty$. In case $\|\Delta_+^2\|_\infty = \|\Delta_-^2\|_\infty < \infty$,
(2.10) and (2.11) provide the same left and right tail bounds.

Proof. The right tail bound (2.10) is Theorem 1 of Maurer (2006), so we can
safely skip its proof. In fact, the left tail bound (2.11) also follows from the same
argument, the so-called Herbst argument. For the reader's convenience we reproduce
the Herbst argument here to get (2.11).

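In this proof, we use only negative $\lambda < 0$. Then, since by our choice of
$Z_k^{(-)}$ we have $Z - Z_k^{(-)} \le 0$, we have $-\lambda(Z - Z_k^{(-)}) \le 0$ for all $k$. So, we can use the log-Sobolev
inequality (2.8). Since $\|\Delta_-^2\|_\infty < \infty$, by (2.8)

$$E\,\lambda Z e^{\lambda Z} - E e^{\lambda Z} \log E e^{\lambda Z} \le \frac{\lambda^2}{2} \|\Delta_-^2\|_\infty E e^{\lambda Z}.$$

Divide both sides by $\lambda^2 E e^{\lambda Z}$. Then, we have

$$\frac{d}{d\lambda}\Big(\frac{1}{\lambda} \log E e^{\lambda(Z - EZ)}\Big) \le \frac{\|\Delta_-^2\|_\infty}{2}.$$

Recall $\lambda < 0$. So, we integrate both sides from $\lambda$ to $0$. Since $\lambda^{-1} \log E e^{\lambda(Z - EZ)} \to 0$
as $\lambda \to 0$, we have $-\lambda^{-1} \log E e^{\lambda(Z - EZ)} \le -\|\Delta_-^2\|_\infty \lambda / 2$, or

$$E e^{\lambda(Z - EZ)} \le \exp\Big(\frac{\|\Delta_-^2\|_\infty \lambda^2}{2}\Big). \tag{2.12}$$

Now, by Chebyshev's inequality with the choice $\lambda = -t/\|\Delta_-^2\|_\infty < 0$ we have the
left tail bound (2.11): by (2.12),

$$P(Z - EZ \le -t) \le e^{\lambda t} E e^{\lambda(Z - EZ)} \le \exp\Big(\lambda t + \frac{\|\Delta_-^2\|_\infty \lambda^2}{2}\Big) = \exp\Big(-\frac{t^2}{2\|\Delta_-^2\|_\infty}\Big).$$

□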
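The last step of the Herbst argument is a one-dimensional optimization over negative $\lambda$. The sketch below checks numerically that the choice $\lambda = -t/\|\Delta_-^2\|_\infty$ minimizes the exponent $\lambda t + \|\Delta_-^2\|_\infty \lambda^2/2$ and gives $-t^2/(2\|\Delta_-^2\|_\infty)$. The numbers `t` and `d` are hypothetical; `d` stands in for a bound on $\|\Delta_-^2\|_\infty$.

```python
def exponent(lam, t, d):
    """Exponent lam*t + d*lam**2/2 of the exponential bound, for lam < 0."""
    return lam * t + d * lam ** 2 / 2

t, d = 1.5, 4.0                 # hypothetical deviation t and bound d on ||Delta_-^2||_inf
lam_star = -t / d               # the choice of lambda made in the proof
best = exponent(lam_star, t, d)
closed_form = -t ** 2 / (2 * d)

# Compare against a grid of negative lambdas in (-5, 0).
grid_min = min(exponent(-0.001 * j, t, d) for j in range(1, 5001))

print(best, closed_form, grid_min)
```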



3 Example. 

In this section, we apply the entropy method for the left tail (Theorem 1) to the
eigenvalues of a sample covariance matrix. In the near future we hope to see many more
exciting examples.

Let $X = (X_{ij})$ be an $n \times N$ complex matrix with independent entries $X_{ij}$.
Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ be the $n$ positive eigenvalues of $\frac{1}{N} XX^*$. Then, under a
suitable condition on the distribution of $X_{ij}$, the Marcenko-Pastur theorem (Marcenko
and Pastur (1967)) says that as $n \to \infty$, $N \to \infty$, $n/N \to c$ ($0 < c < \infty$), the empirical
spectral distribution $\frac{1}{n} \sum_{k=1}^{n} \delta_{\lambda_k}$ of the sample covariance matrix $\frac{1}{N} XX^*$ converges to
the Marcenko-Pastur law. This time we use the Marcenko-Pastur scaling. For the
sample covariance matrix we do not know of any established concentration inequality to
compare with. So, it is rather natural to work with the Marcenko-Pastur scaling. Here
is our result.



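Theorem 2. Let $X = (X_{ij})$ be an $n \times N$ complex matrix with independent entries
$X_{ij}$, which are bounded by 1, i.e., $|X_{ij}| \le 1$. Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ be the $n$ positive
eigenvalues of $\frac{1}{N} XX^*$. Then, for all $k, n, N \ge 1$, and for all $t > 0$,

$$P(\lambda_k - E\lambda_k \ge t) \le \exp\Big(-\frac{N t^2}{2 n^2}\Big), \qquad P(\lambda_k - E\lambda_k \le -t) \le \exp\Big(-\frac{N t^2}{2 n^2}\Big).$$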

Proof. Let $X_t$ be the $t$-th column of $X$. To denote the dependency of the eigenvalues
on the matrix $X$, we let $\lambda_1(X) \ge \lambda_2(X) \ge \cdots \ge \lambda_n(X)$ be the $n$ positive
eigenvalues of $\frac{1}{N} XX^*$. Fix $1 \le k \le n$ and let $Z := Z(X) := \lambda_k(X)$ be the $k$-th largest
eigenvalue of $\frac{1}{N} XX^*$.

Fix $1 \le t_0 \le N$. From the given $n \times N$ matrix $X$ delete the $t_0$-th column $X_{t_0}$, and
add $x_{t_0}$, where $x_{t_0}$ is a constant column vector of size $n$ whose entries are all bounded
by 1. Call this new $n \times N$ matrix $Y$. Using this $Y$ we define $Z_{t_0}^{(+)}$ by

$$Z_{t_0}^{(+)} := \inf_{x_{t_0}} Z(Y). \tag{3.1}$$

Let $S_k$ be an arbitrary $k$-dimensional complex linear subspace of $\mathbb{C}^n$. By the
Courant-Fischer representation theorem (see Theorem 7.7 of Zhang (1999)),

$$Z(X) = \frac{1}{N} \max_{S_k} \min_{v \in S_k,\, v^* v = 1} v^* X X^* v = \frac{1}{N} \max_{S_k} \min_{v \in S_k,\, v^* v = 1} v^* \Big(\sum_{t=1}^{N} X_t X_t^*\Big) v,$$

$$Z(Y) = \frac{1}{N} \max_{S_k} \min_{v \in S_k,\, v^* v = 1} v^* \Big(\sum_{t \ne t_0} X_t X_t^* + x_{t_0} x_{t_0}^*\Big) v.$$
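Since $|X_{l t_0}| \le 1$ and since $v^* v = 1$, we have

$$Z(X) - Z(Y) \le \frac{1}{N} \max_{v \in \mathbb{C}^n,\, v^* v = 1} v^* X_{t_0} X_{t_0}^* v \le \frac{1}{N} \max_{v \in \mathbb{C}^n,\, v^* v = 1} \Big|\sum_{l=1}^{n} \bar{v}_l X_{l t_0}\Big|^2 \le \frac{n}{N}.$$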



Take the infimum over $x_{t_0}$. Then, by the choice of $Z_{t_0}^{(+)}$ given in (3.1) we have

$$0 \le Z - Z_{t_0}^{(+)} \le \frac{n}{N}.$$

So,

$$\sum_{t_0=1}^{N} \big(Z - Z_{t_0}^{(+)}\big)^2 \le \frac{n^2}{N}. \tag{3.2}$$

By (3.2) and by Theorem 1 (i) we have the right tail bound for $Z = \lambda_k$.

Now, we consider the left tail. When we choose $Z_{t_0}$, instead of taking the infimum
this time we take the supremum. Define $Z_{t_0}^{(-)}$ by

$$Z_{t_0}^{(-)} := \sup_{x_{t_0}} Z(Y). \tag{3.3}$$

Then, by the Courant-Fischer representation theorem we have

$$Z(Y) - Z(X) \le \frac{n}{N}.$$

Take the supremum over $x_{t_0}$. Then, by the choice of $Z_{t_0}^{(-)}$ given in (3.3) we have

$$0 \le Z_{t_0}^{(-)} - Z \le \frac{n}{N}.$$

So,

$$\Delta_-^2 \le \frac{n^2}{N}. \tag{3.4}$$

By (3.4) and by Theorem 1 (ii), we have the left tail bound for $Z = \lambda_k$. □
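The key perturbation step of the proof, that replacing one column of $X$ by another column with entries bounded by 1 moves each eigenvalue of $\frac{1}{N} XX^*$ by at most $n/N$, can be illustrated numerically. The following sketch restricts to $n = 2$, where the eigenvalues of a $2 \times 2$ Hermitian matrix have a closed form, so only the standard library is needed. The entry distribution, the sizes, and the helper names are hypothetical choices made for illustration.

```python
import cmath
import math
import random

def eigenvalues_2xN(X):
    """Eigenvalues (largest first) of (1/N) X X* for a 2 x N complex matrix X."""
    N = len(X[0])
    a = sum(abs(z) ** 2 for z in X[0]) / N                      # (1,1) entry
    d = sum(abs(z) ** 2 for z in X[1]) / N                      # (2,2) entry
    b = sum(u * w.conjugate() for u, w in zip(X[0], X[1])) / N  # (1,2) entry
    mid = (a + d) / 2
    rad = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    return [mid + rad, mid - rad]

def random_entry(rng):
    """A complex number of modulus at most 1 (hypothetical entry distribution)."""
    return cmath.rect(rng.random(), rng.uniform(0, 2 * math.pi))

rng = random.Random(0)
n, N = 2, 5
worst = 0.0
for _ in range(2000):
    X = [[random_entry(rng) for _ in range(N)] for _ in range(n)]
    t0 = rng.randrange(N)
    Y = [row[:] for row in X]
    for i in range(n):
        Y[i][t0] = random_entry(rng)   # replace the t0-th column by a bounded column
    for lx, ly in zip(eigenvalues_2xN(X), eigenvalues_2xN(Y)):
        worst = max(worst, abs(lx - ly))

print(worst, n / N)
```

The largest observed change `worst` should stay below `n / N`, in line with the bounds on $Z - Z_{t_0}^{(+)}$ and $Z_{t_0}^{(-)} - Z$.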